Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(#9601): prototype duplicate prevention #9609

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ChinHairSaintClair
Copy link
Contributor

Description

A prototype to prevent duplicate hierarchy contact siblings from being created as discussed with @jkuester and @mrjones-plip in a "technical working session". We expect a lot of feedback, changes, and further discussion before it's ready for approval.

To achieve this, we hook into duplicate detection strategies through "configuration", amend the promise fired on submit in the contacts-edit.component.ts file to run our additional check, and finally output the possible duplicate items to a duplicate_info section added to the enketo.component.html file.

Configuration:

_phdcChanges: { // Additional namespace
  // Specify your own contact_types here
  hierarchyDuplicatePrevention: Partial<{[key in 'person' | 'health_center']: Strategy;}>;
  // The Partial utility ensures that only the allowed keys (health_center, clinic, person, etc) are used, but none are mandatory.
};

Where the keys should match the contact_types listed in your base_settings.json file.

Currently two strategies are available, Levenshtein and NormalizedLevenshtein, with the ability to customize properties based on implementation needs.

Example implementation:

window._phdcChanges = {
  hierarchyDuplicatePrevention: {
    health_center: {
      ...Levenshtein,
      props: [
        {form_prop_path: `/data/health_center/name`, db_doc_ref: 'name'},
        {form_prop_path: '/data/health_center/external_id', db_doc_ref: 'external_id'}
      ],
      queryParams: {
        valuePaths: ['/data/health_center/is_user_flagged_duplicate', '/data/health_center/duplicate/action'],
        // eslint-disable-next-line eqeqeq
        query: (duplicate, action) => duplicate === 'yes' && action != null
      }
    }
  }
}

Where props are the definitions that should be used to evaluate how likely the current record is to its siblings. If no props value is provided, "name" will be used by default. E.g:

  • For a "location" we might want to match on street_number, street_name, and postal_code
  • For a "person" we might want to match on name, sex, and date_of_birth.

form_prop_path is the xml path to the interested value on the currently created/edited form. db_doc_ref would the property name of the sibling database document that the resolved value of the form_prop_path should be compared to. E.g:
Suppose form has a structure of: <data><clinic><name>Test</name></clinic></data>. The path would be: /data/clinic/name. The "clinic" sibling documents would have a property of name.

Finally, we have a use case where we need to conditionally fire the duplicate check based on if our CHW has confirmed a record to be a duplicate, after our backend has flagged the item as such. We use the queryParams's valuePaths to specify the xml path of the question, or questions, that would allow us to mark a record as a duplicate, submit it, and then merge or delete it downstream based on the specified action.

Misc:

We use the CHT provided medic-client/contacts_by_parent view to query for siblings on form init. The request gets processed in the background giving larger queries the opportunity to make progress before the result gets awaited in the form submit.

@kennsippell, since we've touched on the duplicate topic before, it would be great to get your thoughts as this as well.

#Issue
#9601

Code review checklist

  • Readable: Concise, well named, follows the style guide, documented if necessary.
  • Documented: Configuration and user documentation on cht-docs
  • Tested: Unit and/or e2e where appropriate
  • Internationalised: All user facing text
  • Backwards compatible: Works with existing data and configuration or includes a migration. Any breaking changes documented in the release notes.

License

The software is provided under AGPL-3.0. Contributions to this project are accepted under the same license.

queryParams: {
valuePaths: ['/data/health_center/is_user_flagged_duplicate', '/data/health_center/duplicate/action'],
// eslint-disable-next-line eqeqeq
query: (duplicate, action) => duplicate === 'yes' && action != null
Copy link
Contributor

@fardarter fardarter Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkuester @ChinHairSaintClair My feeling here is that this would need to end up with some kind of operator syntax support that lets the user do logic, maybe something like (though doesn't have to be) JsonLogic: https://jsonlogic.com/

The way I see it there are two point of config -- the excel files and the configuration json -- and probably neither should take serialised JS (?).

If we return the key => { key, value } as a tuple or object on key lookup, we could let the user express logic maybe like this (with the value found at the key tested against the value provided (with) using the operator):

let andOnly = {
  logic: [
    [
      // array bracket denotes a logical grouping
      {
        key1: { with: value, op: $in },
        key2: { with: value, op: $contains },
        key3: { with: value, op: $eq },
      },
      {
        key1: { with: value, op: $in },
        key2: { with: value, op: $ne },
        key3: { with: value, op: $startsWith },
      },
    ],
  ],
};

let orWithNestedAnds = {
  logic: [
    [
      {
        key1: { with: value, op: $in },
      },
      {
        key2: { with: value, op: $eq },
      },
    ],
    // each grouping in its own bracket
    [
      {
        key1: { with: value, op: $in },
      },
      {
        key2: { with: value, op: $in },
      },
    ],
  ],
};

let orWithNestedOrWithNestedAnds = {
  logic: [
    [
      {
        key1: { with: value, op: $in },
      },
      {
        key2: { with: value, op: $in },
      },
    ],
    // OR
    [
      [
        // nested groupings acceptable
        {
          key1: { with: value, op: $ne },
        },
        // AND
        {
          key2: { with: value, op: $ne },
        },
      ],
      // OR
      [
        {
          key1: { with: value, op: $contains },
        },
        // AND
        {
          key1: { with: value, op: $startsWith },
        },
      ],
    ],
  ],
};

The other stuff seems easier to move to config.

@@ -88,7 +88,14 @@ declare global {
CHTCore: any;
angular: any;
EnketoForm:any;
_phdcChanges: { // Additional namespace
// Specify your own contact_types here
hierarchyDuplicatePrevention: Partial<{[key in 'person' | 'health_center']: Strategy;}>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkuester @ChinHairSaintClair Following on from my above comment:

The way I see it there are two point of config -- the excel files and the configuration json.

We probably couldn't expect the user to do config here. I agree the type safety is lovely and I'm not sure whether it's possible to generate a type from the config, but if not I don't see how we can keep the type safety and still maintain the existing configuration contract.

@@ -299,6 +376,152 @@ export class ContactsEditComponent implements OnInit, OnDestroy, AfterViewInit {
};
}

private parseXmlForm(form): Document | undefined {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd move as much of this sort of stuff as is possible to a lib/file so it can be unit tested easily and then just called internally. Not sure if private is needed here.

(I know this was just a conceptual prototype not productionised, but for forward looking feedback.)

}
}

return count > 0 ? totalScore / count : null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the alternative return type need to be null or is there an appropriate default value of type int?

}

// Promise.allSettled is not available due to the app's javascript version
private allSettledFallback(promises: Promise<Exclude<any, null | undefined>>[]): Promise<ReturnType[]> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkuester @ChinHairSaintClair Would promise allSettled not be transpiled to a target version?

const $duplicateInfoElement = $('#contact-form').find('#duplicate_info');
$duplicateInfoElement.empty(); // Remove all child nodes
$duplicateInfoElement.show();
// TODO: create a template component where these values are fed into.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need DOM mutation? Can the angular template not handle it?

count++;
}
$duplicateInfoElement.append(content);
$duplicateInfoElement.on('click', '.duplicate-navigate-link', () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No anon functions on event listeners please. They don't get properly cleaned up by the browser. They need to be named.


export const NormalizedLevenshtein: Strategy = {
type: 'NormalizedLevenshtein',
threshold: 0.334,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll probably want to set this in config

};
formService.render.resolves(form);

await createComponent();
await fixture.whenStable();

formService.saveContact.resolves({ docId: 'new_clinic_id' });

// TODO: figure out why this test's dbLookupRef is null despite being set in the beforeEach
component.dbLookupRef = Promise.resolve({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want sinon.resolves

{form_prop_path: `/data/health_center/name`, db_doc_ref: 'name'},
{form_prop_path: '/data/health_center/external_id', db_doc_ref: 'external_id'}
],
queryParams: {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe formQuestion? Name is still bothering me.

health_center: {
...Levenshtein,
props: [
{form_prop_path: `/data/health_center/name`, db_doc_ref: 'name'},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

camelCase is the js standard.

],
queryParams: {
valuePaths: ['/data/health_center/is_user_flagged_duplicate', '/data/health_center/duplicate/action'],
// eslint-disable-next-line eqeqeq
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkuester Continuing to lobby for != as the most excellent exception to the strict equality lint rule. Checks for both null and undefined.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants